20 research outputs found

Comprehensive analysis of normal adjacent to tumor transcriptomes.

Author: Aran Dvir
Butte Atul J
Camarda Roman
Goga Andrei
Krings Gregor
Odegaard Justin
Oskotsky Boris
Paik Hyojung
Sirota Marina
Publication venue: eScholarship, University of California
Publication date: 01/10/2017

Histologically normal tissue adjacent to the tumor (NAT) is commonly used as a control in cancer studies. However, little is known about the transcriptomic profile of NAT, how it is influenced by the tumor, and how the profile compares with non-tumor-bearing tissues. Here, we integrate data from the Genotype-Tissue Expression project and The Cancer Genome Atlas to comprehensively analyze the transcriptomes of healthy, NAT, and tumor tissues in 6506 samples across eight tissues and corresponding tumor types. Our analysis shows that NAT presents a unique intermediate state between healthy and tumor. Differential gene expression and protein-protein interaction analyses reveal altered pathways shared among NATs across tissue types. We characterize a set of 18 genes that are specifically activated in NATs. By applying pathway and tissue composition analyses, we suggest a pan-cancer mechanism in which pro-inflammatory signals from the tumor stimulate an inflammatory response in the adjacent endothelium.

Directory of Open Access Journals

eScholarship - University of California

ROMOP: a light-weight R package for interfacing with OMOP-formatted electronic health record data.

Author: Butte Atul J
Datta Debajyoti
Frazier Remi
Giangreco Nicholas
Glicksberg Benjamin S
Larsen Rick
Lee Nelson
Oskotsky Boris
Rudrapatna Vivek
Tatonetti Nicholas P
Thangaraj Phyllis M
Publication venue: eScholarship, University of California
Publication date: 01/04/2019

Objectives: Electronic health record (EHR) data are increasingly used for biomedical discoveries. The nature of the data, however, requires expertise in both data science and EHR structure. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) standardizes the language and structure of EHR data to promote interoperability of EHR data for research. While the OMOP CDM is valuable and more attuned to research purposes, it still requires extensive domain knowledge to utilize effectively, potentially limiting more widespread adoption of EHR data for research and quality improvement. Materials and methods: We have created ROMOP: an R package for direct interfacing with EHR data in the OMOP CDM format. Results: ROMOP streamlines typical EHR-related data processes. Its functions include exploration of data types, extraction and summarization of patient clinical and demographic data, and patient searches using any CDM vocabulary concept. Conclusion: ROMOP is freely available under the Massachusetts Institute of Technology (MIT) license and can be obtained from GitHub (http://github.com/BenGlicksberg/ROMOP). We detail instructions for setup and use in the Supplementary Materials. Additionally, we provide a public sandbox server containing synthesized clinical data for users to explore OMOP data and ROMOP (http://romop.ucsf.edu).

eScholarship - University of California

Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes.

Author: Butte Atul J
Fan Xuancheng
Glicksberg Benjamin S
Goldstein Theodore
Ludwig Dana
Muenzen Kathleen
Norgeot Beau
Oskotsky Boris
Peterson Thomas A
Rutenberg Eugenia
Schenk Gundolf
Schmajuk Gabriela
Sirota Marina
Yazdany Jinoos
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

There is a great and growing need to ascertain what exactly is the state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical record data. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see any substantial real-world use, primarily because they have been trained on medical text corpora that are too small. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and developed a customizable open-source de-identification software called Philter ("Protected Health Information filter"). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods.

eScholarship - University of California

PatientExploreR: an extensible application for dynamic visualization of patient clinical history from electronic health records in the OMOP common data model.

Author: Atul J Butte
Benjamin S Glicksberg
Bethany Percha
Boris Oskotsky
Debajyoti Datta
Eugenia Rutenberg
Joel T Dudley
Jonathan Wren
Kipp W Johnson
Li Li
Marcus A Badgeley
Mark M Shervey
Nadav Rappoport
Nelson Lee
Nicholas Giangreco
Nicholas P Tatonetti
Phyllis M Thangaraj
Remi Frazier
Riccardo Miotto
Rick Larsen
Sharat Israni
Theodore C Goldstein
Vivek A Rudrapatna
Publication venue: eScholarship, University of California
Publication date: 01/11/2019

Motivation: Electronic health records (EHRs) are quickly becoming omnipresent in healthcare, but interoperability issues and technical demands limit their use for biomedical and clinical research. Interactive and flexible software that interfaces directly with EHR data structured around a common data model (CDM) could accelerate more EHR-based research by making the data more accessible to researchers who lack computational expertise and/or domain knowledge. Results: We present PatientExploreR, an extensible application built on the R/Shiny framework that interfaces with a relational database of EHR data in the Observational Medical Outcomes Partnership CDM format. PatientExploreR produces patient-level interactive and dynamic reports and facilitates visualization of clinical data without any programming required. It allows researchers to easily construct and export patient cohorts from the EHR for analysis with other software. This application could enable easier exploration of patient-level data for physicians and researchers. PatientExploreR can incorporate EHR data from any institution that employs the CDM for users with approved access. The software code is free and open source under the MIT license, enabling institutions to install and users to expand and modify the application for their own purposes. Availability and implementation: PatientExploreR can be freely obtained from GitHub: https://github.com/BenGlicksberg/PatientExploreR. We provide instructions for how researchers with approved access to their institutional EHR can use this package. We also release an open sandbox server of synthesized patient data for users without EHR access to explore: http://patientexplorer.ucsf.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Crossref

eScholarship - University of California

Microbiome preterm birth DREAM challenge: Crowdsourcing machine learning approaches to advance preterm birth research

Author: Amoroso Nicola
Andreoletti Gaia
Bellotti Roberto
Bigcraft Isaac
Bletz Julie
Chen Guanhua
Chung Verena
De Angelis Maria
Flynn Kaitlin J
Gao Jifan
Golob Jonathan L
Ha Connie W
Kosti Idit
Kuntzleman Abigail
Minot Samuel S
Monaco Alfonso
Nelson Amber
Novielli Pierfrancesco
Oskotsky Boris
Oskotsky Tomiko T
Pantaleo Ester
Parraga-Leo Antonio
Roldan Alennie
Romano Donato
Tang Alice S
Tang Zheng-Zheng
Tangaro Sabina
Vacca Mirco
Wei Zhoujingpeng
Wibrand Camilla
Wong Ronald J
Publication venue: Digital Commons @ Michigan Tech
Publication date: 21/12/2023

Every year, 11% of infants are born preterm with significant health consequences, with the vaginal microbiome a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals. The top-performing models (among 148 and 121 submissions from 318 teams) achieve area under the receiver operator characteristic (AUROC) curve scores of 0.69 and 0.87 predicting PTB and ePTB, respectively. Alpha diversity, VALENCIA community state types, and composition are important features in the top-performing models, most of which are tree-based methods. This work is a model for translation of microbiome data into clinically relevant predictive models and to better understand preterm birth.

Michigan Technological University

Systematic identification of ACE2 expression modulators reveals cardiomyopathy as a risk factor for mortality in COVID-19 patients.

Author: Butte Atul J
Hu Zicheng
Kaur Navchetan
Oskotsky Boris
Publication venue: eScholarship, University of California
Publication date: 01/01/2022

Background: Angiotensin-converting enzyme 2 (ACE2) is the cell-entry receptor for SARS-CoV-2. It plays critical roles in both the transmission and the pathogenesis of COVID-19. Comprehensive profiling of ACE2 expression patterns could reveal risk factors of severe COVID-19 illness. While the expression of ACE2 in healthy human tissues has been well characterized, it is not known which diseases and drugs might be associated with ACE2 expression. Results: We develop GENEVA (GENe Expression Variance Analysis), a semi-automated framework for exploring massive amounts of RNA-seq datasets. We apply GENEVA to 286,650 publicly available RNA-seq samples to identify any previously studied experimental conditions that could be directly or indirectly associated with ACE2 expression. We identify multiple drugs, genetic perturbations, and diseases that are associated with the expression of ACE2, including cardiomyopathy, HNF1A overexpression, and drug treatments with RAD140 and itraconazole. Our joint analysis of seven datasets confirms ACE2 upregulation in all cardiomyopathy categories. Using electronic health records data from 3936 COVID-19 patients, we demonstrate that patients with pre-existing cardiomyopathy have a higher mortality risk than age-matched patients with other cardiovascular conditions. GENEVA is applicable to any genes of interest and is freely accessible at http://genevatool.org. Conclusions: This study identifies multiple diseases and drugs that are associated with the expression of ACE2. The effect of these conditions should be carefully studied in COVID-19 patients. In particular, our analysis identifies cardiomyopathy patients as a high-risk group, with increased ACE2 expression in the heart and increased mortality after SARS-CoV-2 infection.

PubMed Central

eScholarship - University of California

Mortality Risk Among Patients With COVID-19 Prescribed Selective Serotonin Reuptake Inhibitor Antidepressants

Author: Aghaeepour Nima
Marić Ivana
Oskotsky Boris
Oskotsky Tomiko
Sirota Marina
Stevenson David K
Tang Alice
Wong Ronald J
Publication venue: eScholarship, University of California
Publication date: 01/11/2021

Importance: Antidepressant use may be associated with reduced levels of several proinflammatory cytokines suggested to be involved with the development of severe COVID-19. An association between the use of selective serotonin reuptake inhibitors (SSRIs), specifically fluoxetine hydrochloride and fluvoxamine maleate, with decreased mortality among patients with COVID-19 has been reported in recent studies; however, these studies had limited power due to their small size. Objective: To investigate the association of SSRIs with outcomes in patients with COVID-19 by analyzing electronic health records (EHRs). Design, setting, and participants: This retrospective cohort study used propensity score matching by demographic characteristics, comorbidities, and medication indication to compare SSRI-treated patients with matched control patients not treated with SSRIs within a large EHR database representing a diverse population of 83,584 patients diagnosed with COVID-19 from January to September 2020 and with a duration of follow-up of as long as 8 months in 87 health care centers across the US. Exposures: Selective serotonin reuptake inhibitors and specifically (1) fluoxetine, (2) fluoxetine or fluvoxamine, and (3) other SSRIs (ie, not fluoxetine or fluvoxamine). Main outcomes and measures: Death. Results: A total of 3401 adult patients with COVID-19 prescribed SSRIs (2033 women [59.8%]; mean [SD] age, 63.8 [18.1] years) were identified, with 470 receiving fluoxetine only (280 women [59.6%]; mean [SD] age, 58.5 [18.1] years), 481 receiving fluoxetine or fluvoxamine (285 women [59.3%]; mean [SD] age, 58.7 [18.0] years), and 2898 receiving other SSRIs (1733 women [59.8%]; mean [SD] age, 64.7 [18.0] years) within a defined time frame. When compared with matched untreated control patients, relative risk (RR) of mortality was reduced among patients prescribed any SSRI (497 of 3401 [14.6%] vs 1130 of 6802 [16.6%]; RR, 0.92 [95% CI, 0.85-0.99]; adjusted P = .03); fluoxetine (46 of 470 [9.8%] vs 937 of 7050 [13.3%]; RR, 0.72 [95% CI, 0.54-0.97]; adjusted P = .03); and fluoxetine or fluvoxamine (48 of 481 [10.0%] vs 956 of 7215 [13.3%]; RR, 0.74 [95% CI, 0.55-0.99]; adjusted P = .04). The association between receiving any SSRI that is not fluoxetine or fluvoxamine and risk of death was not statistically significant (447 of 2898 [15.4%] vs 1474 of 8694 [17.0%]; RR, 0.92 [95% CI, 0.84-1.00]; adjusted P = .06). Conclusions and relevance: These results support evidence that SSRIs may be associated with reduced severity of COVID-19 reflected in the reduced RR of mortality. Further research and randomized clinical trials are needed to elucidate the effect of SSRIs generally, or more specifically of fluoxetine and fluvoxamine, on the severity of COVID-19 outcomes.

eScholarship - University of California

Comparing Ethnicity-Specific Reference Intervals for Clinical Laboratory Tests from EHR Data

Author: Butte Atul J
Oskotsky Boris
Paik Hyojung
Rappoport Nadav
Tor Ruth
Zaitlen Noah
Ziv Elad
Publication venue: eScholarship, University of California
Publication date: 01/11/2018

Background: The results of clinical laboratory tests are an essential component of medical decision-making. To guide interpretation, test results are returned with reference intervals defined by the range in which the central 95% of values occur in healthy individuals. Clinical laboratories often set their own reference intervals to accommodate variation in local population and instrumentation. For some tests, reference intervals change as a function of sex, age, and self-identified race and ethnicity. Methods: In this work, we develop a novel approach that leverages electronic health record data to identify healthy individuals and test for differences in laboratory test values between populations. Results: We found that the distributions of >50% of laboratory tests with currently fixed reference intervals differ among self-identified racial and ethnic groups (SIREs) in healthy individuals. Conclusions: Our results confirm the known SIRE-specific differences in creatinine and suggest that more research needs to be done to determine the clinical implications of using one-size-fits-all reference intervals for other tests with SIRE-specific distributions.

PubMed Central

eScholarship - University of California

A certified de-identification system for all clinical text documents for information extraction at scale.

Author: Ashouri Choshali Habibeh
Butte Atul J
Israni Sharat
Muenzen Kathleen
Oskotsky Boris
Plunkett Thomas
Radhakrishnan Lakshmi
Schenk Gundolf
Publication venue: eScholarship, University of California
Publication date: 04/07/2023

Objectives: Clinical notes are a veritable treasure trove of information on a patient's disease progression, medical history, and treatment plans, yet are locked in secured databases accessible for research only after extensive ethics review. Removing personally identifying and protected health information (PII/PHI) from the records can reduce the need for additional Institutional Review Board (IRB) reviews. In this project, our goals were to: (1) develop a robust and scalable clinical text de-identification pipeline that is compliant with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule for de-identification standards and (2) share routinely updated de-identified clinical notes with researchers. Materials and methods: Building on our open-source de-identification software called Philter, we added features to: (1) make the algorithm and the de-identified data HIPAA compliant, which also implies type 2 error-free redaction, as certified via external audit; (2) reduce over-redaction errors; and (3) normalize and shift date PHI. We also established a streamlined de-identification pipeline using MongoDB to automatically extract clinical notes and provide truly de-identified notes to researchers with periodic monthly refreshes at our institution. Results: To the best of our knowledge, the Philter V1.0 pipeline is currently the first and only certified, de-identified redaction pipeline that makes clinical notes available to researchers for non-human-subjects research, without further IRB approval needed. To date, we have made over 130 million certified de-identified clinical notes available to over 600 UCSF researchers. These notes were collected over the past 40 years, and represent data from 2,757,016 UCSF patients.

eScholarship - University of California

Deep phenotyping of Alzheimer's disease leveraging electronic medical records identifies sex-specific clinical associations.

Author: Allen Isabel E
Bicak Mesude
Dubal Dena
Glicksberg Benjamin S
Havaldar Shreyas
Hu Zicheng
Mantyh William G
Oskotsky Boris
Oskotsky Tomiko
Sirota Marina
Solsberg Caroline Warly
Tang Alice S
Woldemariam Sarah
Zeng Billy
Publication venue: eScholarship, University of California
Publication date: 01/02/2022

Alzheimer's Disease (AD) is a neurodegenerative disorder that is still not fully understood. Sex modifies AD vulnerability, but the reasons for this are largely unknown. We utilize two independent electronic medical record (EMR) systems across 44,288 patients to perform deep clinical phenotyping and network analysis to gain insight into clinical characteristics and sex-specific clinical associations in AD. Embeddings and network representation of patient diagnoses demonstrate greater comorbidity interactions in AD in comparison to matched controls. Enrichment analysis identifies multiple known and new diagnostic, medication, and lab result associations across the whole cohort and in a sex-stratified analysis. With this data-driven method of phenotyping, we can represent AD complexity and generate hypotheses of clinical factors that can be followed up for further diagnostic and predictive analyses, mechanistic understanding, or drug repurposing and therapeutic approaches.

PubMed Central

eScholarship - University of California